BBC RD 1991/14 






Research 
Department 

Report 



HDTV SOUND: 

Programme production developments 

D.J. Meares, B.Sc, C.Eng., F.I.O.A., M.I.E.E. 



Research Department, Engineering Division 
THE BRITISH BROADCASTING CORPORATION 




Summary 

The success or otherwise of multichannel sound with HDTV will depend to a great 
extent on the programme makers being able sensibly to exploit the sound system in order 
to enhance the listening/viewing experience. This Report documents the BBC's programme 
experiments over a period of about two years and presents the conclusions of that work. 



Index terms: Sound; multichannel systems; HDTV 



Issued under the Authority of
Head of Research Department

Research Department, Engineering Division,
BRITISH BROADCASTING CORPORATION

(S-4) 1991 



© British Broadcasting Corporation 

No part of this publication may be reproduced, stored in a 
retrieval system, or transmitted in any form or by any 
means, electronic, mechanical, photocopying, recording, 
or otherwise, without prior permission. 




1. An Overview
2. On-site Recording
3. Post-production Activities
4. Exploitation of Multichannel Sound
5. Compatibility Issues
6. Conclusions
7. Acknowledgements
8. References

Appendix 1: Compatibility Matrixing
    A1.1 Introduction
    A1.2 Downwards compatibility

Appendix 2: Downward Mixing of Multichannel Audio Signals
    A2.1 Introduction
    A2.2 The proposal






© BBC 2006. All rights reserved. Except as provided below, no part of this document may be 
reproduced in any material form (including photocopying or storing it in any medium by electronic 
means) without the prior written permission of BBC Research & Development except in accordance 
with the provisions of the (UK) Copyright, Designs and Patents Act 1988. 

The BBC grants permission to individuals and organisations to make copies of the entire document 
(including this copyright notice) for their own internal use. No copies of this document may be 
published, distributed or made available to third parties whether by paper, electronic or other means 
without the BBC's prior written permission. Where necessary, third parties should be directed to the 
relevant page on BBC's website at http://www.bbc.co.uk/rd/pubs/ for a copy of this document. 






1. AN OVERVIEW 

If HDTV sound is to introduce multichannel/surround sound presentations,
multiple languages and other options [1,2,3] into the television
environment, then it will inevitably make an impact on the normal way of
working for television programme companies. It will be shown that this
impact need not be too large, but there will be areas where changes are
needed.

In the studio, very few programmes currently rely on a single mono or
stereo microphone [4]. Presenters and panellists would normally be provided
with a personal microphone each, while orchestras are normally recorded
with a multiplicity of spot microphones, coincident pair main microphones
and ambience microphones. Different mixing techniques have already been
shown to be capable of taking these existing microphone arrangements and
producing acceptable surround sound programme balances [5]. As will be seen
later, however, slightly different microphone practices can make better
surround presentations.

The sound mixing room is where most of the 
change may well be required, but those changes are 
likely to be concentrated in a few additional or 
alternative facilities rather than mammoth global 
changes. It will be seen that new sound panning 
facilities will be needed on the sound desk, as well as 
changes to the monitoring arrangements. Equally, it 
should be stated that many programmes have already 
been produced with virtually no changes to the 
existing stereo facilities. So the extent of the changes 
could be based on the extent of penetration of the new 
services into the market. 

The only major production commitment that needs to be changed, if these
new services are to be introduced, is the number of sound channels to be
recorded and routed with the video signal. With the ingress of digital
audio into the fabric of the broadcasting infrastructure [6], multiplexed
digital audio routeing will present little or no problem, but there will
always be the problem of recording the extra sound signals until HDTV
recorders with extra sound tracks become the norm.



2. ON-SITE RECORDING 

The on-site recording problems vary enormously depending on the venue.
Apart from the 'routine' problems of setting up the recording
infrastructure, the success or otherwise of the programmes has centred
around having sufficient microphones in the right places to enable the
post-production mixing to take place. This has also determined whether
real or artificial 'sound fields' can be created in the listening
environment.

The examples given below, whilst not recorded 
solely for a single specific surround sound format, 
were mixed initially to a 3/2 format. Whilst other 
formats have been assessed, the production comments 
below, except where stated, are based mainly on the 
experience of the 3/2 presentation, which is claimed 
by many to have most potential without making 
excessive demands on producers or consumers. 

Recordings at the Wimbledon Tennis Championships in 1989 and 1990 were
planned from the start to present a sound picture that was as simple as
possible. The concept was that, as the pictures would concentrate on one
basic view of the Centre Court, i.e. down the centre line of the court, the
sound impression should try to match this as closely as possible. To this
end a sound field microphone (SFM), placed on the centre line at the
boundary between court and spectators' stand, was used as the main feed,
with spot microphones (mostly gun microphones) used as fillers: Fig. 1
shows the arrangement used. Even with this venue, time delays across the
court were problematic and the single gun microphone placed at the far end
of the court had too great a delay to be used; for far-court effects the
gun microphone on the umpire's chair was found to be sufficient. (In this
situation of widely-spaced moving sound sources and distant gun
microphones, it was not possible to prescribe appropriate compensatory
audio delays, as would have been the practice for orchestral recordings.)
The mix obtained was intimate and most enjoyable and, perhaps because of
the inter-channel relationships generated by the simple microphone
arrangement, was relatively easy to matrix down from surround to three
channel and stereo reproduction formats.

A = sound field microphone
B = gun microphone, Sennheiser 416 and 816

Fig. 1 - Wimbledon: Centre Court.

The FA Cup Final at Wembley 1989 was a 
much more difficult environment to record in. It is 
vast by comparison to the Centre Court; it has a huge 
crowd of 82,000, giving rise to substantial swings in 
both sound level and sound location as first one 
section of the crowd reacts and then another. Amidst 
all this, the recording engineer still has to try to pick 
up the sound of the ball being kicked somewhere on 
the pitch. Attempts were made to deduce in advance a 
good location for the SFM, bearing in mind that 
nothing could be suspended above the pitch. (See 
Fig. 2.) In the event, the SFM output was so heavily 
biased towards the nearest section of the crowd that it 
was unusable. For this recording, gun microphones on 
both the crowd and the pitch effects were used 
exclusively. The aim at the mix down was to create 
an exciting though, admittedly, 'unreal' sound event, 
by surrounding the listener with crowd effects. This 
was achieved successfully, though the perspective on 
the crowd was as though the listener had been 
reserved a large block of seats entirely to himself, so 
that no-one else was within several feet of him. 

A = sound field microphone
Others = Sennheiser 816

Fig. 2 - Wembley Stadium.

For a recording of 'The Prince of the Pagodas', at The Royal Opera House,
Covent Garden in London, the normal stereo microphone arrangement, as shown
in Fig. 3, was used. This comprised a combination of a main stereo pair
over the proscenium arch, spot microphones on the various sections of the
orchestra in the pit and pressure zone microphones on the front edge of the
stage. The principal aim was to capture the orchestral layout using the
stereo pair, supplementing this with spot microphones and stage microphones
as necessary. For this venue it is not acceptable to have front-of-house
microphones, other than the discreetly placed stereo pair. It was the
intention of the mixing session to simulate the ambience, using artificial
reverberation fed with the existing microphone feeds; this was probably the
best arrangement for other reasons, as the hall was relatively dry and
front-of-house microphones would inevitably have picked up audience noises
(see later under Royal Albert Hall). The main orchestral acoustic picture
was given by the stereo microphones panned three quarters left of centre
and three quarters right of centre. (Though this did not present any great
problems, it was felt, at the time of the remix, that a triple microphone
arrangement specifically for the three front channels, or a SFM suitably
decoded, would have been better.) To this was added sufficient signal from
the spot microphones in the orchestra pit, to enhance the presence of the
orchestral sounds. The stage microphones were not ultimately required as
there were sufficient stage effects on the main microphones. In generating
the rear channel feeds, additional delay was introduced into the main
microphone feeds and artificial reverberation, fed from a mix of the spot
microphones, was added. This produced a rather subtle surround mix,
although not one totally free of delay effects. (Better use of compensatory
delays, in the initial mix fed to the reverberator, may have avoided this.)
One other point, discovered during the mixing, was that a shortage of
tracks (only 24!!) at the recording venue had led to sub-mixes being
recorded for a number of orchestral sections. These were normal stereo
sub-mixes, whereas a three channel sub-mix would have been more useful in
post-production.

A = C422 (main pair)
B = PCC160 (PZM)
Others = mix of U87, KM84, C414, C460 and B&K 4006

Fig. 3 - The Royal Opera House.

Several recordings have now been made at the Royal Albert Hall with
improvements taking place on each successive visit. The last one was the
most ambitious in its post-production intentions and therefore the most
revealing as a pointer to the future. As shown in Fig. 4, a multiplicity of
microphones was used for the sound pick up, with equalising delays imposed
on the spot microphones to effect approximate co-timing. The main
orchestral 'picture' was created by suitable panning of the five
omnidirectional microphones suspended in a curtain over the front ranks of
the orchestra. Spot microphones were mixed into this frontal presentation
to enhance contributions from the more distant orchestral sections.
Finally, the hall microphones were panned to the sides and rear to create
the surround sound. The mix created a well defined frontal sound stage,
with natural reverberation mixed to create a realistic sound field. Not
only were sources found to feed the rear channels in isolation, but
additional feeds, appropriately timed, were mixed half way down each side
of the listening area to fill in the hole-at-the-sides. In surround sound
this worked extremely well, giving a very real impression. The problem was,
however, that it made very poorly compatible stereo and three channel
presentations, simply because of the amount of real audience noise that was
then overlaid on top of the orchestra. The only solution, using the
available prerecorded tracks, was to create an artificial ambience, as with
the Covent Garden recording, using the orchestral microphone signals to
feed the artificial reverberator. The ultimate for such a recording would
be to use a combination of the two, with real ambience being used as much
as possible, e.g. before and after the music, and artificial ambience
during the concert when audience noises become disturbing. This would also
enable full exploitation of the real hall sounds on those occasions where
the Promenaders join in.
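The 'equalising delays' applied to the spot microphones follow from path-length differences: a spot microphone close to an instrument receives the sound earlier than the more distant main microphones, so its feed is delayed by the difference in travel time. The following is a minimal sketch only. The speed of sound (343 m/s), the 48 kHz sample rate, the function names and the distances in the usage note are all assumptions for illustration; none of them is taken from the recordings described.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def equalising_delay_ms(source_to_main_m, source_to_spot_m):
    """Delay (in ms) to add to a spot microphone feed so that it is roughly
    co-timed with the main microphones for the same source."""
    extra_path_m = source_to_main_m - source_to_spot_m
    if extra_path_m < 0:
        raise ValueError("spot microphone is further away than the main microphones")
    return 1000.0 * extra_path_m / SPEED_OF_SOUND_M_S


def delay_in_samples(delay_ms, sample_rate_hz=48000):
    """Round a delay in milliseconds to a whole number of samples."""
    return round(delay_ms * sample_rate_hz / 1000.0)
```

For example, with a hypothetical instrument 10 m from the main microphones and 1 m from its spot microphone, `equalising_delay_ms(10.0, 1.0)` gives a delay of roughly 26 ms. With moving sources, as at Wimbledon, the distances change continuously, which is why no fixed delay could be prescribed there.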

One final production experiment which is worthy of note was a joint
BBC/IRT/WDR experiment in Köln*. This used, as its source, a multitrack
recording of the BBC programme Horse of the Year Show, which covers a group
of show jumping competitions. This was interesting, primarily because the
recording was made with no thought of surround sound; the microphones were
rigged entirely for the transmission of stereo programmes. The layout of
microphones at the stadium is shown in Fig. 5, with eight gun microphones
suspended high above the arena and additional stereo microphones directed
at one section of the audience. The intention of the programme mix was to
present a stable wide frontal presentation of the action in the arena, with
crowd noises and reactions coming from all round the listener. There were,
however, two problems in carrying this out. Firstly, there was so much
sound energy in the arena microphone signals, coming from the stadium
public address and crowd, that a great deal of gain riding of the arena
microphones was necessary in order to pick out the horse and rider effects.
This is however normal for a stereo production. The second point is one
where much more work is required to find an optimum solution, and that is
the normal tendency to cut between vastly different camera positions to
give the optimum visual presentation whilst, of necessity, retaining a
fixed aural perspective and view of the events. These points apart, the
sound presentation was a success, despite the absence of any advance
planning for the surround sound.

* This recording and mixing experiment is the subject of further
discussion in the companion Report [1].

A = stereo pair, e.g. C414, C422
B = soloist, C414
C = B&K omni
Others = U87, KM84, VR62

Fig. 4 - Royal Albert Hall.

A = cardioid, AKG460
Others = gun, MKH416

Fig. 5 - Horse of the Year Show.
This last programme, and an orchestral 
recording by WDR, were used during the same 
experiments to enable different sound presentations 
and number of channels to be compared. Of particular 
importance were the separate comparisons of front 
channel presentations and rear channel presentations. 
In brief, the general conclusion was that there was a 
law of diminishing returns applying to both. When 
properly mixed/balanced sounds were used, three 
front channels/loudspeakers gave a much more 
enjoyable presentation than two, but four was not a 
great improvement on three. Similarly for the 
surround sounds, a single additional channel gave 
much worse realism than two, but four was not 
significantly better than two. This certainly supports 
the BBC's previous findings on reproduction formats. 



3. POST-PRODUCTION ACTIVITIES 

Whilst it can be concluded therefore that for 
many productions the sound pick-up problems of 
surround sound can be overcome, what about the 
potential difficulties in the control rooms and the need 
for changes in techniques and technology? 

Two post-production environments have been used extensively by the BBC for
its sound mixing experiments: they are Sypher 2 and the control room of the
Music Studio (TMS), both at BBC Television Centre in London, see Figs. 6, 7
and 8. Both areas are relatively cramped, particularly for surround sound;
both areas are full of technical equipment and need careful rigging of the
stereo facilities in order to make the multichannel mixes. However, it must
be borne in mind that both areas are in many ways typical of television
sound areas, except that they are larger than most, at least in the BBC.
This is a significant point to remember when deciding how HDTV programmes
will be balanced in the future. There are probably very few broadcast
studios or outside broadcast facilities that could easily accommodate
multichannel or surround sound monitoring, without at least a degree of
reorganisation. (By way of contrast, see Ref. 7.)

However, by careful adaptation it has been possible to use both Sypher 2
and TMS successfully, though in each case at least half a day was needed to
effect the necessary reorganisation of the stereo facilities. Particularly
time-consuming were the rigging and balancing of the loudspeakers and the
plugging/switching of the stereo desk routeing system to enable surround
sound processing to be achieved.

Absolutely essential in the rigging of the loudspeakers is a type of
loudspeaker that is capable of generating a precise, sharp image when a
common signal is applied to a pair of loudspeakers spaced at about 60°. The
monitoring room must also be restrained in the effect it has on the sound.
In this context, the BBC's tendency to have very heavily treated monitoring
rooms (reverberation time ≤ 0.2 s) has proved to be advantageous. (It is
also probably fortunate that the BBC has, in the main, so far avoided
following the pop music fashion of Live-End-Dead-End acoustic design, but
has instead continued to try to achieve an even distribution of acoustic
treatment. Loudspeakers in various parts of the room are therefore likely
to interact with the room in a similar way to one another.)




Fig. 6 - Multichannel audio 
production in Sypher 2. 







Fig. 7 - Sypher control room layout. 

Once a listening centre has been defined, the loudspeakers should be
located as required. In neither of the two rooms used by the BBC so far was
this particularly easy. In the Sypher Suite, a large amount of technical
apparatus behind the monitoring position had to be moved back towards the
rear wall. Even then, the rear loudspeakers were mounted on a frame over a
bank of tape recorders. In the case of TMS, a large encased ventilation
duct had been installed along the rear wall below the level of the false
ceiling. The rear loudspeakers should ideally have been placed under this,
on the basis of distance to the monitoring point, but this would have
seriously affected their tonal quality. Two solutions were used to overcome
this: on the first occasion, the signals to the rear loudspeakers were
delayed to compensate for the distance error, whilst for the last session,
all loudspeakers were moved closer to the monitoring position. (A third
solution of opening the rear 'stage', such that the rear loudspeakers
subtended an angle of about 120 degrees [8], was not possible in either TMS
or Sypher because of the location of technical equipment.) The final
problem in the placement of loudspeakers is that ideally one would wish to
have the centre front loudspeaker and the picture monitor co-located. For
the mixing rooms, the loudspeaker has always been allowed to take pride of
place, with the picture offset. For reproduction or demonstration rooms,
the picture must be placed centrally with the centre channel either above
or below it. Additionally, for HDTV CRT displays, care must be taken to
screen the display from the stray magnetic flux of the loudspeakers; this
may on occasions require the encapsulation of the toroidal magnets in the
loudspeakers.

Balancing the levels of the loudspeakers is a straight extension of the
technique used for stereo. A common signal is fed in turn to each adjacent
pair of loudspeakers and the sensitivity of one of them is adjusted to
create a central image. The advantage of the aural approach is that it is
far more sensitive (accurate) than the alternative of a sound level meter:
furthermore, sound supervisors are used to making aural judgements; they
are not necessarily experts in the use of a sound level meter. It should
also be noted that identical or closely similar loudspeakers should be used
for all channels. It is misleading to use different (cheaper) loudspeakers
for the rear channels, particularly at the sound balancing stage, where one
may be tempted to alter the balance to compensate for some of the
deficiencies of the monitoring system.

Fig. 8 - Control room of Music Studio.

Mixing desk routeing has required care, specifically because of the stereo
organisation of most sound desks. Several solutions have been found, which
may not be elegant but which do provide multichannel routeing on a stereo
desk. Two other points are, however, worth further discussion here, namely
automated mixing and panning. Experience would now indicate that automation
is absolutely essential for post-production editing of surround sound.
Considering the task of editing, one has somehow to carry the surround
effect across each edit point with, on most occasions, no significant
change at the edit point. For sports this is far from easy, simply because
of the dramatic changes in crowd noise that are likely to occur at an edit.
The solution is to fade-in the post-edit sound well before the edit and
fade-out the pre-edit sound well after the edit; but even then, each edit
may need several attempts before the effect is acceptable. With the number
of channels involved, manual control of the whole event is virtually
impossible, even for a skilled operator.
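The overlapping fade around an edit point can be sketched as a pair of gain curves applied identically to every channel, which is the kind of task the automation would take over. This is an illustrative sketch only: the linear ramp, the function names and the `lead` and `tail` parameters are assumptions, since the text specifies neither the fade shape nor the timings.

```python
def edit_gains(t, edit_time, lead, tail):
    """Return (pre_edit_gain, post_edit_gain) at time t, in seconds.

    The incoming (post-edit) sound starts fading in `lead` seconds before
    the edit, and the outgoing (pre-edit) sound finishes fading out `tail`
    seconds after it. A linear ramp over the whole overlap is assumed.
    """
    if lead + tail <= 0:
        raise ValueError("the fades must overlap for a non-zero time")
    start = edit_time - lead
    end = edit_time + tail
    if t <= start:
        return 1.0, 0.0
    if t >= end:
        return 0.0, 1.0
    progress = (t - start) / (end - start)
    return 1.0 - progress, progress


def crossfade_frame(pre_frame, post_frame, t, edit_time, lead, tail):
    """Mix one multichannel frame (a list with one sample per channel)."""
    g_pre, g_post = edit_gains(t, edit_time, lead, tail)
    return [g_pre * a + g_post * b for a, b in zip(pre_frame, post_frame)]
```

Because the same two gains are applied to all five channels of a 3/2 mix, the surround image is carried across the edit as a whole rather than channel by channel.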

Panning also requires great care if one is to use the three frontal
channels and the surround channels to their best. The most clearly defined
sound images* are those created by feeding audio signals to adjacent
loudspeakers. Thus to pan a sound from left to right, the sound must be
cross-faded via the centre channel. Stereo desks are not organised to
operate this way but can be forced to provide such a facility for
non-varying, static location, panning. Dynamic panning will require some
changes to the format of a normal stereo desk, and several workers in this
field are already studying the problem. Not only has the principle of
multichannel panning to be accommodated, but it is necessary to determine
the optimum panning law. Stereo panning has usually been provided by means
of a sine/cosine relationship between the two channels, but there are
already strong arguments [9] for different panning laws for surround sound
and even laws involving more than two channels at once. However, it is
predicted that, in the early stages of multichannel operations, at least,
revised dynamic panning will be provided by means of an add-on box to a
standard desk in order to minimise the financial impact.

* In the author's opinion this is still a valid statement, but suggestions
have been made that the use of all three front loudspeakers could create
better panned images. Whilst this may well even out any variations in the
quality of the images across the stage, it has still to be proven that
sharper images can be created.
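Panning across the three front channels by cross-fading via the centre can be sketched by applying the familiar stereo sine/cosine law to whichever adjacent pair (left/centre or centre/right) contains the image. This is a hedged illustration, not a proposed panning law: the position scale from -1 (left) through 0 (centre) to +1 (right) and the function name are assumptions, and the text makes clear that the optimum law was still to be determined.

```python
import math


def pan_lcr(position):
    """Return (left, centre, right) gains for a pan position in [-1, 1].

    Only two adjacent loudspeakers are fed at any time, and a sine/cosine
    law is used within each pair, so the summed power stays constant.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must lie between -1 (left) and +1 (right)")
    if position <= 0.0:
        # Left/centre pair: 0 rad is fully left, pi/2 rad is fully centre.
        angle = (position + 1.0) * math.pi / 2.0
        return math.cos(angle), math.sin(angle), 0.0
    # Centre/right pair: 0 rad is fully centre, pi/2 rad is fully right.
    angle = position * math.pi / 2.0
    return 0.0, math.cos(angle), math.sin(angle)
```

At a position of -0.5 the left and centre gains are equal (about 0.707 each) and the right channel is silent, so the image sits midway between the left and centre loudspeakers.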

The other area of mixing and post-production where stereo devices can be
adapted for multichannel working, is that of sound effects. To date, only
stereo devices have been available and therefore there was no choice.
Experience has shown however that multichannel solutions would be better
and easier to use. This comment applies not only to DSP devices such as
artificial reverberators, delays, flangers, etc., but also to sound effects
discs. The opening section of a tennis programme which the BBC produced,
used three stereo sound effects discs to create the multi-dimensional
ambience that was needed to accompany a high crane-mounted camera shot of
the tennis club and its environment. Surround sound recordings would
obviously have simplified the production.

One essential change of facilities that will be needed, both for
production and post-production areas, is the provision of multi-format
monitor switching [10]. Current stereo productions include occasional
switches to monophonic monitoring for assessment of compatibility. Likewise
the surround sound production will need to be monitored for compatible
reproduction in stereo, three channel and any other formats that are
considered to be representative of a reasonable proportion of the systems
being used by the audience. Current methods of providing for this
multi-format switching are cumbersome or expensive or both, even though the
technology is in principle relatively simple. In the near future this will
be most easily provided by an add-on box placed in the signal feeds between
the desk output and the loudspeakers. In the longer term, it could become a
standard extension of the monitor switching already provided in the sound
desk.
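The signal processing inside such a monitor switch amounts to a small set of mixing matrices applied to the 3/2 desk output. The sketch below is hypothetical: the -3 dB (0.7071) coefficients for the centre and surround contributions, the channel labels and the function name are assumptions chosen for illustration and are not taken from this Report.

```python
K = 0.7071  # assumed -3 dB gain for centre and surround contributions


def downmix(frame, fmt):
    """Derive a monitoring feed from one 3/2 frame.

    frame: dict with keys "L", "C", "R", "LS", "RS" (one sample each).
    fmt:   "3/2" (unchanged), "3/0" (three channel), "2/0" (stereo)
           or "1/0" (mono).
    """
    l, c, r, ls, rs = (frame[k] for k in ("L", "C", "R", "LS", "RS"))
    if fmt == "3/2":
        return {"L": l, "C": c, "R": r, "LS": ls, "RS": rs}
    if fmt == "3/0":
        # Fold each surround channel into the front channel on its own side.
        return {"L": l + K * ls, "C": c, "R": r + K * rs}
    if fmt == "2/0":
        # Additionally share the centre channel between left and right.
        return {"L": l + K * c + K * ls, "R": r + K * c + K * rs}
    if fmt == "1/0":
        stereo = downmix(frame, "2/0")
        return {"M": K * (stereo["L"] + stereo["R"])}
    raise ValueError("unknown monitoring format: " + fmt)
```

Switching `fmt` while listening to the same programme material is the multichannel equivalent of the occasional switch to mono used in stereo production.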

As already mentioned above, the other development that will ease the
introduction of multichannel sound into the production environment, is the
provision of HDTV recorders with multiple sound channels. The EBU has
already recommended to various groups that such a recorder should have at
least eight channels of audio, with more for those machines required to
provide for track-bouncing or assembly editing. If radical increases in
track numbers are required, or where track-laying for post-production work
is being planned, then a separate synchronised audio recorder will continue
to be used as now.



4. EXPLOITATION OF MULTICHANNEL SOUND

A great deal of experience by many broadcasters with Television Stereo,
and a more limited amount of experience with multichannel sound for
television, has led to an expectation of what can be achieved in the new
era of HDTV Sound. To some extent it benefits from and matches the
experience of the cinema industry, but in at least one specific aspect the
conclusions differ.

In the cinema industry, the programme maker has to cope with an extreme
range of reproduction environments and listener placements. The cinema can
vary from a small intimate 20/50 seat environment with very low
reverberation, to a large auditorium seating many hundreds of people with a
comparatively long reverberation. The viewing/listening angle can vary over
almost 180 degrees, and whatever the conditions, the producer has to try to
generate spatial coincidence between the sound image and the visual image.
This has led [11] to the use of the centre channel in the cinema sound
system for dialogue, with virtually no dialogue from any other location. In
the television industry however, the normal reproduction environment
domestically will be much smaller, more intimate and less reverberant. The
listening position will also be much less extreme than the worst positions
in the cinema. This opens the horizons for the programme maker, who can at
last contemplate the use of directional cues on the voices to add a new
dimension to the production. This is already being seen as being valuable
in television stereo and, indeed, some IMAX productions [12] and
experiments into systems offering 'virtual reality' [13] are exploiting
spatial sound techniques in an attempt to get closer to the real
experience. Whilst 'virtual reality' will be a long time in coming to HDTV,
the more limited spatial representation of dialogue will be seen to be
increasingly important in HDTV programme production. (See also Fig. 9.)

The way in which film sound tracks will be handled is itself a debatable
matter. If surround sound encoding has already been implemented, say using
the Dolby encoding format, why not just radiate those signals as they are?
It should be remembered however that Dolby Surround is just one of a number
of formats that exists already and from which the HDTV programmes will be
sourced. There are also additional formats being developed and marketed
[14]. These various film formats have to be brought to a common
transmission format prior to broadcasting, if the reception decoding
problems are to be reduced to sensible proportions. Additionally, it should
be borne in mind that the film sound tracks are mixed to be compatible with
a very wide picture and when the picture is reduced, even to HDTV size,
there are going to be occasions where the sound stage, particularly for
in-vision effects, is going to be too wide. By decoding the film sound
formats to a common 'loudspeaker-feed' format at the studio centre, not
only can the best possible decoder be used in each case, at no cost to the
end customer, but also, the sound presentation can be modified if necessary
to suit the somewhat smaller screen. It is because of such optimisation
matters, that the CCIR and EBU groups studying HDTV sound are recommending
members to decode film sound tracks prior to transmission.

Fig. 9 - Distribution of sound images. (Diagram labels, positioned
relative to the screen: comedy out-of-vision speech; drama out-of-vision
speech/crowds; drama in-vision speech; out-of-vision commentator; game-show
panellists; more than one presenter; newsreader.)

It will be important in other ways as well, not to over-stretch the
brain's desire to have sounds coming from all around the head. In real life
one has the ability, when one's interest is aroused, to turn to face an
event in order to apply one's sight to the interpretation of that event. If
in surround sound such interest is generated, the turn of the head will
only bring an empty wall into view. In this context, the sound system must
enhance the visual imagery rather than take precedence, and whilst
transient sources of sound could be located outside the visual image, such
areas should normally be restricted to ambient sounds, see Fig. 10.

Surrounding sounds have, however, always been found to be beneficial when
correctly mixed, although this comment refers mainly to systems with two
channels of surround. For most of the time, a single channel of surround,
whether fed to one or more loudspeakers, is perceived as an independent
source with a precise location of its own. This is due to a combination of
the spatial angular displacement between this loudspeaker(s) and the front
ones, and the different directional properties of the hearing system for
sources from the rear. Particularly with crowd noises, the impression given
is that there is a large number of people all in one spot, rather than
spread across the rear. But it should be noted that 'more is not
necessarily better'. Some experimenters, whilst stating that 'a minimum of
two (loudspeakers) is needed', found that four gave problems with the
timbre of the surround sound [15]. Thus, at least two channels of surround
sound will be needed, but the extent of the benefit of the full surround
sound presentation over the multichannel frontal presentation will always
depend on the skills of the programme maker and on the type of programme
[16,17].

Fig. 10 - Preferred location of sound sources.
[Diagram key: 1 = direct sources; 2 = reverberation; 3 = effects. Regions in the diagram are labelled 2+3.]

Table 1: Estimates of the number of adults in Great Britain arranged into different hearing loss categories*

Description of hearing    dBHL better    Number of adults    % of total adult
loss (BSA categories)     ear average    (millions)          population

Mild                      25-40          5.0                 11.33
Moderate                  41-70          2.2                  4.99
Severe                    71-95          0.24                 0.54
Profound                  96+            0.06                 0.14
Total                                    7.5                 17.00

* Based on data provided by the Royal National Institute for the Deaf and the Institute of Hearing Research.

As the extent of HDTV coverage increases, or
even to ensure that it does, there will be an increasing
demand for multiple language options. For major
sports events (witness the Eureka 95 HDTV relays of
the 1990 World Cup football matches and the plans
for the 1992 Olympic Games) it will be essential to
provide multiple language commentary facilities. This
has in any case been foreseen at the planning stages of
such systems as D-MAC. There are arguably also
strong cases for the provision of at least one extra
language for films and co-productions, viz. the original
language and an over-dubbed language for the
majority of listeners in the particular country of replay.
Such provisions are relatively simple for sports, where 
the commentators are an adjunct to the event rather 
than part of the action being reproduced. The 
consequence for sports is one extra channel per 
language, even if surround sound is being radiated. In 
virtually all other types of programme, i.e. for all 
dialogue other than commentary, the production is 
attempting to place the spoken word into the acoustic 
environment of the programme. This requires a full 
mix for each language; three languages in stereo would 
require six sound channels; two languages in surround 
sound would require ten channels, etc. Thus multiple 
language working might in principle be a worthy 
cause, but it could be expensive for many programmes. 
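
The channel arithmetic in the paragraph above can be sketched as follows. This is only an illustration: the function names and the format table are assumptions introduced here, and the sports case assumes one shared programme mix plus one commentary channel per language, which is one reading of 'one extra channel per language'.

```python
# Illustrative sketch only; names and the format table are assumptions.
CHANNELS_PER_MIX = {"mono": 1, "stereo": 2, "surround_3_2": 5}


def channels_full_mix(languages, fmt):
    """Drama, film etc.: every language needs its own complete mix."""
    return languages * CHANNELS_PER_MIX[fmt]


def channels_commentary(languages, fmt):
    """Sports: one shared mix plus one commentary channel per language (assumed reading)."""
    return CHANNELS_PER_MIX[fmt] + languages


print(channels_full_mix(3, "stereo"))          # three languages in stereo
print(channels_full_mix(2, "surround_3_2"))    # two languages in surround sound
print(channels_commentary(2, "surround_3_2"))  # sports with two commentaries
```

The first two calls reproduce the figures of six and ten channels quoted above.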

Additional services, such as clean dialogue for
the Hard of Hearing (HoH) and dynamic range
control, should also be seen as both potential
improvements in HDTV sound and potential
consumers of the data capacity that is available in HDTV
for sound applications. The HoH channel is however
one that should be very carefully considered [18]. It is
estimated that upwards of 17% of the population of
such countries as the UK have some loss of hearing
(see Table 1) and that with age the relative percentage
increases. It should also be recognised that with
hearing loss comes the problem of dynamic range
accommodation and difficulties in distinguishing
between wanted and unwanted sound in a complex
mix [19]. If capacity can be made available for a HoH
sound channel (ideally, by reallocating a commentary
channel for this purpose†), then the otherwise
disenfranchised HoH listeners would benefit
enormously. The cost for the programme maker would
be the cost of providing clean dialogue (it already
exists in the early stages of many programme mixes)
and of conveying it to the transmitter.



5. COMPATIBILITY ISSUES 

As mentioned above, the artistic requirements 
of compatibility frequently require changes to be made 
to the surround sound mix, specifically because of the 
sound balances achieved in three channel or stereo 
presentations. The concert recording, above, though 
good in surround with the audience noise, had to be 
changed because of the stereo reproduction. Initial 
programme balances for football and tennis have had 
to be modified because of image location problems in 
stereo. The amount of sound energy, e.g. crowd noise, 
in the surround channels has to be tempered, on 
occasions, because it would be overpowering in the 
stereo mix. 

But not all programmes need such surround 
channel attenuation in stereo, even if they can tolerate 
it. Drama, for instance, may need the full strength of 
the rear channels to be retained in the compatible 
presentations. Football crowd noises may need
considerable attenuation, whilst concert hall ambience
may need very little. There may need to be a 
mechanism for changing the parameters of the 
baseband compatibility matrix to suit the programme 
type, but for obvious operational reasons this should 
be avoided if at all possible. 

To date, various compatibility compromises 
have been deduced during the mixing sessions, as 
shown in Table 2. As can be seen, the level changes 
required for compatible reproduction do vary from 
one programme type to another but probably not by a 



† Whilst the data capacity of a commentary channel is arguably too great for the needs of the hard of hearing channel and the use of sophisticated
bit-rate reduction techniques could substantially reduce the data requirements of the channel, the fact remains that there is likely to be much
more pressure to standardise on a receiver design that will cope with a commentary channel than there will be pressure for a HoH channel. Thus
by adopting a commentary (or language) channel for the HoH channel, the only special feature required by the HoH is a data flag to tell the
receiver which commentary channel holds the sound for the HoH. This flag facility is identical in principle to that needed for French, German or
other languages. Thus the receiver hardware for the HoH will not be special, and therefore could be expected to be relatively inexpensive.






Table 2: Experimentally deduced attenuations of 3/2 surround sound signals in different forms of sound presentation.

Programme              5-ch to 4-ch            5-ch to 3-ch            5-ch to 2-ch

Football and Tennis    S' = S1+S2-6 dB         L, C, R unchanged       L' = L+(C-6 dB)+(S1+S2-9 dB)
                                               S1, S2 unused           R' = R+(C-6 dB)+(S1+S2-9 dB)

Promenade Concert 1    S' = S1+S2-x dB         C unchanged             L' = L+(C-6 dB)+(S1-6 dB)
                       x = 6 for audience      L' = L+S1               R' = R+(C-6 dB)+(S2-6 dB)
                       x = 4 for ambience      R' = R+S2

Promenade Concert 2    Not assessed            C unchanged             L' = L+(C-3 dB)+(S1-6 dB)
                                               L' = L+(S1-4 dB)        R' = R+(C-3 dB)+(S2-6 dB)
                                               R' = R+(S2-4 dB)


very significant amount. It has thus been argued that 
sound monitoring should encompass all of the various 
reproduction formats in order to guarantee artistic 
compatibility of the sound balance. 
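
As an illustration of how the decibel figures of Table 2 translate into mixing gains, the sketch below applies the two-channel rows for 'Football and Tennis' and 'Promenade Concert 2' to single samples. The function names are hypothetical, and the reading of '(S1+S2-9 dB)' as the sum of S1 and S2 attenuated by 9 dB is an assumption.

```python
# Illustrative sketch only; function names are assumptions.
def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude gain."""
    return 10.0 ** (db / 20.0)


def football_to_stereo(L, R, C, S1, S2):
    """5-ch to 2-ch, 'Football and Tennis' row of Table 2."""
    centre = db_to_gain(-6.0) * C
    surround = db_to_gain(-9.0) * (S1 + S2)  # assumed: sum, then 9 dB attenuation
    return L + centre + surround, R + centre + surround


def concert2_to_stereo(L, R, C, S1, S2):
    """5-ch to 2-ch, 'Promenade Concert 2' row of Table 2."""
    centre = db_to_gain(-3.0) * C
    return (L + centre + db_to_gain(-6.0) * S1,
            R + centre + db_to_gain(-6.0) * S2)


# A centre-only source appears equally in both stereo channels.
print(football_to_stereo(0.0, 0.0, 1.0, 0.0, 0.0))
print(concert2_to_stereo(0.0, 0.0, 1.0, 0.0, 0.0))
```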

But such compatibility monitoring cannot be 
done in isolation from, nor in ignorance of, the 
transmission system. In the case of normal stereo, a 
strict transmission relationship exists between the 
stereo signals and the mono signal; in programme 
mixing and monitoring that same relationship is used. 
So it will be for multichannel sound systems of the 
future; for instance, there will be strict relationships 
between five channel surround sound, three channel 
frontal and two channel stereo. To a great extent it 
will be the artistic compatibility requirements that will 
determine those relationships, but there may be 
additional transmission-dependent factors that must 
also be taken into account. For instance, artistic 
requirements have already been seen to be satisfied by 
a compatibility matrix of the form shown in
Appendix 1 [20, 21]. This, however, requires a reverse
matrix in the receiver in order to derive the
loudspeaker signals for each form of reproduction.
Such matrixing may not be compatible with some of
the higher ratios of bit-rate reduction based on
psychoacoustic principles, which are being proposed
for transmission in, for instance, DAB and
HD-MAC [22, 23, 24]. In this case, alternative proposals
for the transmitted signals have been tabled [25]; namely
loudspeaker signals for the most complex member of a
transmission hierarchy (say 3/2 surround sound),
whilst the loudspeaker signals for the less complex
reproduction formats (say stereo) are derived in the
receiver by downmixing [26] (see Appendix 2). The
advantage of this, for the programme maker, is that
different constraints can be applied to the generation
of signals for each form of reproduction (for instance
more/less attenuation of the rear sound signals).
However, it is obviously even more important in such
a system that there be a standard method of
downmixing, such that the production staff can be
assured that they are hearing the same thing as each of
their groups of listeners.

The other aspect of compatibility, required of 
any transmission format or matrix, is compatibility 
with different forms of programme origination. Any 
broadcaster with a library of existing programmes is 
going to want to be able to continue to exploit that 
library, regardless of the fact that yesterday's
programmes were made with a different sound
presentation in mind than that proposed for tomorrow (see
Fig. 11). In other words, old programmes with, say,
stereo sound have still got to make sense when
broadcast through a multichannel sound transmission system.
That is one of the additional features that went into the 
derivation of the compatibility matrix of Appendix 1. 



Fig. 11 - The multiple opportunities of a compatibility matrix.
[Diagram: each source format (mono, stereo, three, four or five channel) offers a multiple receiver choice, and each receive format (mono, stereo, three, four or five channel) accepts a multiple source choice.]






There are also spatial compatibility benefits at 
the production stage in the use of multiple channels of 
sound, just as much as for the listener at home. The 
change from two channels (stereo) to three channels at 
the front has been found to be particularly worthwhile. 
As stated for the home listener, the centre channel 
adds a great deal of stability and sharpness to the 
frontal sound images for a wide range of listening 
positions [27, 28, 29]. No longer do the sounds move
rapidly to the closer loudspeaker as the listening 
position moves away from the precise centre line; with 
three channels the shift in image is dramatically 
reduced. This is a particular bonus to the sound 
supervisor, who has, of necessity, to move around by 
significant distances whilst mixing the programme. 

Thus, it can be seen that there is much to be 
gained by ensuring that the sound system of the future
offers compatibility in a number of ways. But that 
compatibility, as has been seen, makes demands on the 
transmission system. 



6. CONCLUSIONS 

Developments in HDTV sound are proceeding 
apace and various international committees and groups 
are studying some of the more theoretical aspects of 
the subject. Equally important are the experiments
reported here on the programme production work, 
and the multifaceted problem of compatibility. Existing 
facilities have been shown to be adaptable to the 
needs of multichannel sound production; but 
undoubtedly, purpose-designed facilities would be 
better. The requirements of compatibility with existing 
audiences always occur with new services, but there 
are solutions available. Other aspects of the system 
design, particularly the transmission constraints, will 
also ultimately influence the development of the 
compatible multichannel sound system for HDTV. 



7. ACKNOWLEDGEMENTS 

The author would like to thank his many 
colleagues throughout the BBC for their contributions 
to the work recorded here. 



8. REFERENCES 

1. MEARES, D.J. 1991. Developments in multi- 
channel sound for HDTV. BBC Research 
Department Report No. 1991/13. 

2. CCIR. Suitable sound systems to accompany 
high-definition and enhanced television systems. 
Report 1072. Draft revision 1989. Doc. 10/369. 



3. WATERS, G.T. et al. 1990. Sounds of the 
future. EBU Review - Technical. No. 241-242, 
June/August 1990, pp 58-69.

4. GOODSON, L. 1991. Microphone selection and 
balance techniques for television, stereo and 
surround sound. Proceedings of the AES 9th 
International Conference 'Television Sound, 
Today and Tomorrow'. Detroit, Michigan, 
1-2 February 1991, pp 105-118. 

5. MEARES, D.J. 1991. High definition sound for 
HDTV. Proceedings of the AES 9th International 
Conference 'Television Sound, Today and 
Tomorrow', Detroit, Michigan, 1-2 February 
1991, pp 187-215. 

6. JOHNSON, M. 1991. Digital audio in TV 
broadcasting. Proceedings of the AES 9th 
International Conference 'Television Sound, 
Today and Tomorrow', Detroit, Michigan, 1-2 
February 1991, pp 53-57. 

7. HAMASAKI, K. How to handle sound with 
large screens. Proceedings of the Broadcast 
Sessions of 17th International Television 
Symposium, Montreux, Switzerland, 13-18 June 
1991, pp 649-671. 

8. MEARES, D.J. 1990. Can sound be high 
definition? InterBEE Symposium on 'Sound with 
Pictures'. Tokyo, 7 November 1990. 

9. GERZON, M.A. 1990. Three channels: the 
future of stereo? Studio Sound, June 1990, 
pp 112-125.

10. MEARES, D.J. 1991. Multichannel sound 
systems for HDTV. International Colloquium 
'Auditory Virtual Environment and Telepresence'. 
Ruhr-Universität, Bochum, 8 April 1991. [To be
published in Applied Acoustics] 

11. ALLEN, I. 1991. Matching the sound to the 
picture. Proceedings of the AES 9th International 
Conference 'Television Sound, Today and 
Tomorrow', Detroit, Michigan, 1-2 February 
1991, pp 177-186. 

12. WALES, A. 1990. Sound and acoustics in the 
IMAX theatre. InterBEE Symposium on 'Sound 
with Pictures'. Tokyo, 7 November 1990. 

13. FURNESS, T.A. 1991. Keynote paper on virtual 
environment. Proceedings of the International 
Colloquium 'Auditory Virtual Environment and 
Telepresence'. Ruhr-Universität, Bochum, 8 April
1991. [To be published in Applied Acoustics]. 






14. FLEMMING, H.J. 1991. Cinema Digital Sound: 
the technology. Proceedings of the BKSTS 12th 
International Conference. London, 9-12 July 
1991. 

15. HOLMAN, T. 1990. New factors in sound for
cinema and television. Proceedings of the 89th 
Convention of the Audio Engineering Society, 
22-25 September 1990. Preprint No. 2945. 

16. SCHULEIN, R.B. 1991. Television and audio/ 
video production techniques using the stereo- 
surround audio production format. Proceedings of 
the AES 9th International Conference 'Television 
Sound, Today and Tomorrow', Detroit, Michigan, 
1-2 February 1991, pp 151-161. 

17. SAWAGUSHI, M. et al. 1991. Surround 
broadcasting in Japan. Proceedings of the AES 
9th International Conference 'Television Sound, 
Today and Tomorrow', Detroit, Michigan, 
1-2 February 1991, pp 163-174. 

18. UK. 1991. Sound broadcasting for the hearing- 
impaired. CCIR Doc. 10/1-35. January 1991. 

19. CCIR. 1989. HDTV sound channel for the 
hearing impaired. CCIR Doc. 10/314-E 
(11/563-E). 2 October 1989. 

20. MEARES, D.J. 1990. HDTV baseband com- 
patibility matrixing. EBU Doc. No. V3/HTS-001. 
November 1990. 

21. Different constraints can lead to somewhat 
different matrices. See for instance: 

GERZON, M.A. Optimal reproduction matrices 
for multispeaker stereo. [To be published in the 
Proceedings of the 91st AES Convention, 
October 1991].



22. THEILE, G. et al. 1988. Low bit rate coding of 
high-quality audio signals. An introduction to the 
MASCAM system. EBU Review - Technical 
No. 230, 1988. pp 158-181. 

23. BRANDENBURG, K.H. et al. 1988. Real-time 
implementation of low complexity transform 
coding. AES Preprint No. 2581. 

24. ten KATE, W.R., van de KERKHOF, L.M. and 
ZIJDERVELD, F.F. 1990. Digital audio carrying 
extra information. 15th International Conference 
on Acoustics, Speech and Signal Processing, 
April 1990. Paper No. 6.A1.2. 

25. THEILE, G. 1991. HDTV sound systems: how 
many channels? Proceedings of the AES 9th 
International Conference 'Television Sound, 
Today and Tomorrow', Detroit, Michigan, 
1-2 February 1991, pp 217-232. 

26. MEARES, D.J. 1991. Downward mixing 
of multichannel audio signals. 27 February 
1991. Contribution to CCIR draft document 
'Multichannel sound systems'. Draft CCIR Doc. 
No. TG 10/1-36. April 1991. 

27. MEARES, D.J. 1990. HDTV sound: permissible 
sources and desirable outputs. CCIR Doc. 
No. IWP 10/12-09, March 1990. 

28. THEILE, G. 1990. Further developments of 
loudspeaker stereophony. Proceedings of the 
89th Convention of the Audio Engineering 
Society, 21-25 September 1990. Preprint No. 
2947. 

29. KOMIYAMA, S. et al. 1990. The 3-1 
quadraphonic sound system and its application to 
an HDTV home receiver. NHK Laboratory 
Note, Serial No. 382. July 1990. 






APPENDIX 1 
Compatibility Matrixing 



A1.1 Introduction 



Multi-channel sound programmes may need matrixing for two reasons. Firstly, there will be a need to 
ensure that different modes of reception are possible, regardless of the mode of transmission. Secondly, the 
transmission bit-rate reduction may itself have need of certain relationships between the various sound channels. 

Within the hierarchy of systems being recommended by the CCIR, there are many individual methods of 
utilizing the sound channels. At the highest level, for cinema types of application, the 4/4 system provides 
potentially the highest quality of frontal sound image localization and the most natural ambience from the 
surround channels. At the lower levels, the hierarchy provides for those programme makers and listeners who want 
to make or listen to stereo or mono sound balances. It is the fundamental aim of a compatibility matrix to provide 
simple but controlled inter-relationships between these various modes of programme generation and programme 
reception. 

Thus, the matrix has to provide ways in which, say, a 3/2 broadcast can be received in 3/2 format or 3/0
format etc. This is termed 'downward compatibility', and is discussed in this Appendix.

It also has to enable the archives of existing programmes in stereo to be transmitted over the multi-channel
system and to be received sensibly on the various formats of receiver. This is termed 'upwards conversion' and,
whilst not discussed in any further detail, provision is made for it in the equations in this Appendix.

Finally, there may be circumstances where intrinsic downward compatibility of the transmitted signals is 
not required and discrete loudspeaker signals are transmitted, but where, nevertheless, it is required to generate 
loudspeaker signals at the receiver which relate to a lower order in the hierarchy, i.e. a 3/0 presentation of 3/2 
transmitted signals. Under these circumstances, a standardised form of 'downward mixing' equations will be needed 
in the receiver: these are given in Appendix 2. 

In order to identify specific loudspeakers and channel sources in a variety of arrangements, the codes given
in Fig. A1.1 have been used. For the 3/2 presentation, the five loudspeakers/channels are designated L, C, R, SL
and SR. For the 4/4 presentation, the loudspeakers/channels are designated L, CL, CR, R, SL1, SL2, SR1 and
SR2. Lesser arrangements are obvious reductions from the above.

Fig. A1.1 - Loudspeaker layouts for surround sound

A1.2 Downwards compatibility 

Concentrating first on the downward compatibility, a mathematical matrix provides for simple reception in 
the lower orders of the hierarchy when higher orders of programme are being broadcast. For instance, if a 3/2 
format programme is being broadcast the stereo receiver should only need to decode two transmission channels in 
order to drive the stereo loudspeakers. 

The same applies to nearly all combinations of transmission and receive mode. A simplification can,
however, be made if there is a distinction between the auditoria formats of source and reception (seen here as 4/4,
4/2, and 3/4) and the domestic formats (seen here as 3/2, 3/1, 3/0, etc.). Undoubtedly, there is a need to be able
to derive 3/2 transmission signals from, say, 4/4 production signals; but this derivation is considered a separate
task from the transmission compatibility matrixing.

We can also deal separately with the front production channels and the surround production channels, as 
any reduction will be for either one group of channels or the other. Thus, dealing first with the front channels, 
there is a need to derive three intermediate signals (L', R', C) from the four production signals (L, R, CL, CR). 
Two proposals have been considered, namely simple addition of CL and CR such that: 

C  = CL + CR
L' = L
R' = R

and more complex reduction such that:

C  = .866*CL + .866*CR
L' = L + .500*CL
R' = R + .500*CR

Preliminary tests on these two proposals have demonstrated a significant shortcoming of the simple 
proposal, in that any sound sources spread between CL and CR are collapsed, in the reduced format, to come 
from a single location: thus, the distribution of static sound sources or the evenness of movement across the front 
sound stage, are adversely affected. On this basis, the more complex reduction is recommended for further study. 
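
The shortcoming of the simple proposal can be illustrated numerically. The sketch below is not part of the reported tests: it assumes a source panned between CL and CR with a sine/cosine law, and the function names are hypothetical.

```python
import math


def reduce_simple(L, R, CL, CR):
    """Simple proposal: CL and CR are added into C. Returns (L', C, R')."""
    return L, CL + CR, R


def reduce_complex(L, R, CL, CR):
    """More complex proposal: CL and CR are shared between C and the outer channels."""
    return L + 0.500 * CL, 0.866 * (CL + CR), R + 0.500 * CR


# Assumed test signal: a source panned from CL (0 degrees) to CR (90 degrees).
for degrees in (0, 45, 90):
    angle = math.radians(degrees)
    cl, cr = math.cos(angle), math.sin(angle)
    print(degrees,
          [round(x, 3) for x in reduce_simple(0.0, 0.0, cl, cr)],
          [round(x, 3) for x in reduce_complex(0.0, 0.0, cl, cr)])
```

With the simple reduction only the C feed is ever non-zero, so every pan position is reproduced from the same loudspeaker; with the more complex reduction the balance between L' and R' follows the pan position.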

Dealing secondly with the surround channels, there is no single obviously correct way of reducing from
four channels (SL1, SL2, SR1, SR2) to two (SL, SR), as the requirements vary, depending on the contents of the
four original surround signals. If these signals are four discrete sources, then simple addition is enough, such that:

SL = .707*(SL1 + SL2)
SR = .707*(SR1 + SR2)

However, if, as has been suggested for some programmes, SL2 and SR2 have already been derived
artificially from SL1 and SR1, then the reduction should only take account of the true surround material, such
that:

SL = SL1
SR = SR1

Thirdly, if the four source surround channels are derived from a coincident microphone arrangement, it
may be necessary to use a weighted sum of the source channels:

SL = k1*SL1 + k2*SL2 + k3*SR2
SR = k1*SR1 + k2*SR2 + k3*SL2

where k1, k2 and k3 are the weighting functions. The correct form of reduction will, therefore, depend on factors
relating to the original signals; but experience may lead to a single reduction algorithm being found to be adequate.
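
The three surround reductions above can be set side by side in a short sketch. The function names are hypothetical, and the weights passed to the third function are arbitrary placeholders, since no values for k1, k2 and k3 are given in this Report.

```python
# Illustrative sketch only; function names and example weights are assumptions.
def reduce_discrete(SL1, SL2, SR1, SR2):
    """Four discrete sources: scaled addition of each side."""
    return 0.707 * (SL1 + SL2), 0.707 * (SR1 + SR2)


def reduce_derived(SL1, SL2, SR1, SR2):
    """SL2 and SR2 derived artificially from SL1 and SR1: keep only the true surround."""
    return SL1, SR1


def reduce_weighted(SL1, SL2, SR1, SR2, k1, k2, k3):
    """Coincident microphone source: weighted sum, with k1, k2, k3 supplied by the user."""
    return (k1 * SL1 + k2 * SL2 + k3 * SR2,
            k1 * SR1 + k2 * SR2 + k3 * SL2)


sample = (1.0, 0.5, 0.2, 0.1)  # SL1, SL2, SR1, SR2
print(reduce_discrete(*sample))
print(reduce_derived(*sample))
print(reduce_weighted(*sample, k1=1.0, k2=0.5, k3=0.25))  # placeholder weights
```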

Having thus reduced the higher levels of the hierarchy to the 3/2 format, a compatibility matrix can be
derived that achieves the aims already reported. It was with these aims in mind that the audio encode and decode
matrices given in Table A1.1 were derived. They comprise a group of equations that take the source (production)
signals L, R, C, SL and SR, as they might come from a tape recorder, and combine them into five signals A, B, T,
Q1 and Q2, for conveyance to the transmission encoding/modulation circuitry for ultimate broadcasting. These
encoding matrices provide the essence of compatible reception, as well as providing a reversible matrix for the
surround listener. The decode equations are subdivided according to the style of the listening equipment. They
provide for listening in mono (1/0), stereo (2/0), three channel (3/0), four channel (3/1) and five channel (3/2).




Table A1.1: Five channel surround encoding and decoding equations.

Encoding equations

              L       R       C      SL      SR
A   =     1.000    .000    .707    .707    .000
B   =      .000   1.000    .707    .000    .707
T   =      .000    .000    .707    .000    .000
Q1  =      .000    .000    .000    .707    .707
Q2  =      .000    .000    .000    .707   -.707

Decoding equations

Each decoded signal is given first in terms of the transmitted signals (A, B, T, Q1, Q2) and then, after
substitution, in terms of the source signals (L, R, C, SL, SR).

Mono - 1/0 format

              A       B       T      Q1      Q2              L       R       C      SL      SR
M   =      .707    .707    .000    .000    .000    =      .707    .707   1.000    .500    .500

Stereo - 2/0 format

              A       B       T      Q1      Q2              L       R       C      SL      SR
L'  =     1.000    .000    .000    .000    .000    =     1.000    .000    .707    .707    .000
R'  =      .000   1.000    .000    .000    .000    =      .000   1.000    .707    .000    .707

Three channels - 3/0 format

              A       B       T      Q1      Q2              L       R       C      SL      SR
L'  =     1.000    .000  -1.000    .000    .000    =     1.000    .000    .000    .707    .000
R'  =      .000   1.000  -1.000    .000    .000    =      .000   1.000    .000    .000    .707
C   =      .000    .000   1.414    .000    .000    =      .000    .000   1.000    .000    .000

Four channels - 3/1 format

              A       B       T      Q1      Q2              L       R       C      SL      SR
L'  =     1.000    .000  -1.000   -.500    .000    =     1.000    .000    .000    .354   -.354
R'  =      .000   1.000  -1.000   -.500    .000    =      .000   1.000    .000   -.354    .354
C   =      .000    .000   1.414    .000    .000    =      .000    .000   1.000    .000    .000
S'  =      .000    .000    .000   1.000    .000    =      .000    .000    .000    .707    .707

Five channels - 3/2 format

              A       B       T      Q1      Q2              L       R       C      SL      SR
L'  =     1.000    .000  -1.000   -.500   -.500    =     1.000    .000    .000    .000    .000
R'  =      .000   1.000  -1.000   -.500    .500    =      .000   1.000    .000    .000    .000
C   =      .000    .000   1.414    .000    .000    =      .000    .000   1.000    .000    .000
SL' =      .000    .000    .000    .707    .707    =      .000    .000    .000   1.000    .000
SR' =      .000    .000    .000    .707   -.707    =      .000    .000    .000    .000   1.000




Looking at the encode matrix, it has been arranged in tabular form. Thus the first line represents the 
equation: 

A = 1.000*L + 0.000*R + 0.707*C + 0.707*SL + 0.000*SR 

The decode matrix is similarly arranged, except that, having presented the appropriate equations for
combining the transmitted signals A, B, T, Q1 and Q2, substitution is then made in order to show what the result
is in terms of the original source signals L, R, C, SL and SR.
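
The substitution described above amounts to multiplying a decode matrix by the encode matrix. The following sketch, using the coefficients of Table A1.1 and a hypothetical helper `matmul`, checks that the five channel (3/2) decode recovers the source signals to within the three-figure rounding of the coefficients, and that the stereo decode gives the source-signal form listed in the table.

```python
# Coefficients from Table A1.1; helper names are assumptions for illustration.
ENCODE = [  # columns: L, R, C, SL, SR
    [1.000, 0.000, 0.707, 0.707,  0.000],  # A
    [0.000, 1.000, 0.707, 0.000,  0.707],  # B
    [0.000, 0.000, 0.707, 0.000,  0.000],  # T
    [0.000, 0.000, 0.000, 0.707,  0.707],  # Q1
    [0.000, 0.000, 0.000, 0.707, -0.707],  # Q2
]
DECODE_3_2 = [  # columns: A, B, T, Q1, Q2
    [1.000, 0.000, -1.000, -0.500, -0.500],  # L'
    [0.000, 1.000, -1.000, -0.500,  0.500],  # R'
    [0.000, 0.000,  1.414,  0.000,  0.000],  # C
    [0.000, 0.000,  0.000,  0.707,  0.707],  # SL'
    [0.000, 0.000,  0.000,  0.707, -0.707],  # SR'
]
DECODE_2_0 = [
    [1.000, 0.000, 0.000, 0.000, 0.000],  # L'
    [0.000, 1.000, 0.000, 0.000, 0.000],  # R'
]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


round_trip = matmul(DECODE_3_2, ENCODE)  # close to the 5 x 5 identity
stereo = matmul(DECODE_2_0, ENCODE)      # stereo feeds in terms of L, R, C, SL, SR

for row in round_trip:
    print([round(abs(x), 3) for x in row])
print(stereo)
```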

One aspect of the encode and decode matrix that becomes apparent with further analysis is that, in some
modes of reproduction, there is a variation in reproduced sound power level depending on source location. It
should be recollected, however, that the same already applies to the M and S matrixing in stereo, which gives rise
to a 3 dB variation. In this case, the variation is given in Fig. A1.2 for the specific case of the equations of
Table A1.1 and sine/cosine panning of the sound source around the surround stage. The figure shows, for a five channel
source (3/2 format), the sound power variation for reproduction in mono (1/0), stereo (2/0), three (3/0) and four
(3/1) channel formats, for the different directions of centre front, left front, left surround and the intermediate
locations. The decoding to five channels (3/2) is perfect, in that there is no variation of sound power level.
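
A rough numerical sketch of such a calculation is given below. It is not a reconstruction of Fig. A1.2: it assumes that reproduced power is the sum of the squared loudspeaker feed amplitudes, uses the source-signal forms of Table A1.1, and the names `pan` and `power_db` are hypothetical.

```python
import math

CHANNELS = ("L", "R", "C", "SL", "SR")

# Loudspeaker feeds in terms of the source signals (Table A1.1).
FORMATS = {
    "mono":   [[0.707, 0.707, 1.000, 0.500, 0.500]],
    "stereo": [[1.000, 0.000, 0.707, 0.707, 0.000],
               [0.000, 1.000, 0.707, 0.000, 0.707]],
    "3-ch":   [[1.000, 0.000, 0.000, 0.707, 0.000],
               [0.000, 1.000, 0.000, 0.000, 0.707],
               [0.000, 0.000, 1.000, 0.000, 0.000]],
    "4-ch":   [[1.000, 0.000, 0.000,  0.354, -0.354],
               [0.000, 1.000, 0.000, -0.354,  0.354],
               [0.000, 0.000, 1.000,  0.000,  0.000],
               [0.000, 0.000, 0.000,  0.707,  0.707]],
}


def pan(first, second, fraction):
    """Sine/cosine pan between two source channels; fraction runs from 0 to 1."""
    angle = fraction * math.pi / 2.0
    gains = dict.fromkeys(CHANNELS, 0.0)
    gains[first] = math.cos(angle)
    gains[second] = math.sin(angle)
    return [gains[name] for name in CHANNELS]


def power_db(matrix, source):
    """Sum of squared loudspeaker feeds, in dB relative to a unit-power source (assumed measure)."""
    power = sum(sum(c * s for c, s in zip(row, source)) ** 2 for row in matrix)
    return 10.0 * math.log10(power)


for label, source in (("centre front", pan("C", "L", 0.0)),
                      ("left front", pan("C", "L", 1.0)),
                      ("left surround", pan("L", "SL", 1.0))):
    print(label, {name: round(power_db(m, source), 1) for name, m in FORMATS.items()})
```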



Fig. A1.2 - Audio compatibility matrix. Reproduced power variation: 5-ch surround sound.
[Plot: reproduced sound power against sound source location, from centre front to left surround, with curves for Mono, Stereo, 3-ch and 4-ch reproduction.]






APPENDIX 2 
Downward Mixing of Multichannel Audio Signals 

A2.1 Introduction 

When stereo radio was being defined it was recognised that provision had to be made for the compatible 
reception of the signals in both mono and stereo. Thus, it was decided to transmit the signals: 

M = (L+R)/2 and S = (L-R)/2 

such that simple mono reception could continue, whilst stereo was provided by the two signals and some simple
decoding. The same is true today with multichannel sound for either television or radio: several forms of
reception have to be accommodated, and thus compatibility matrixing has been proposed.
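
The 'simple decoding' for stereo follows directly from the two definitions: adding M and S returns L, and subtracting them returns R. A minimal sketch, with hypothetical function names:

```python
def ms_encode(left, right):
    """Form the transmitted sum and difference signals M and S."""
    return (left + right) / 2.0, (left - right) / 2.0


def ms_decode(m, s):
    """Recover the stereo pair: L = M + S, R = M - S."""
    return m + s, m - s


m, s = ms_encode(0.8, 0.2)
print(m, s)             # a mono receiver simply reproduces M
print(ms_decode(m, s))  # a stereo receiver recovers L and R
```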

This assumes that there is a reason, such as compatibility with existing transmission formats or cheaper 
receivers, for providing the matrix at the transmission end of the broadcast chain. In the context of some of the 
bit-rate reduction proposals now being developed, it also has the drawback of requiring sum and difference 
matrixing at the receiver to re-establish the loudspeaker feed signals. This can, under some circumstances, expose 
the limitations of the bit-rate reduction systems, and hence fresh proposals are being tabled in CCIR and EBU
working parties. These proposals suggest the transmission of loudspeaker feed signals for the highest member of the 
sound hierarchy, say 3/2 surround sound, with proportional summation being used to generate loudspeaker signals 
for lower members of the hierarchy, say a 3/0 presentation. This proportional summation is termed 'downward 
mixing'. 

A2.2 The proposal 

The downward mixing equations are presented in two tables, which refer respectively to 3/2 source 
material and 4/4 source material. They are arranged, in tabulated matrix form, to show reproduction equations for 
a number of loudspeaker arrangements. 

Table A2.1 gives the downward mixing equations for 3/2 source material, based on the equations used for
compatibility matrixing. It shows reproduction in formats such as mono (1/0), stereo (2/0), 3/0, 3/1 and 2/2. It
should be remembered that its derivation was based to some extent on the subjective appraisal of 3/2 material in
such formats. It is relevant to note that there is deliberate attenuation of the surround source channels in formats
with reduced numbers of loudspeakers, to reduce the level of ambient sounds in the resulting mixes. Thus, in
mono, the surround sources are attenuated by 6 dB, and in stereo, 3/0, 2/1 and 3/1 formats by 3 dB. Also, the
surround information is panned to optimise the subjective impression given.

The downward mixing equations for 4/4 source material are shown in Table A2.2 for mono, stereo, 3/0,
4/0, 3/2, and 4/2 presentations etc. Their derivation follows the principles used for the 3/2 source material above.
Again, there is attenuation of the surround sources in some of the reproduction modes. These are 6 dB in mono,
stereo, 3/0 and 4/0 format; 3 dB in 2/1 and 3/1 format; 0 dB in 2/2, 3/2, and 4/2 formats. The panning of
signals has been derived on the basis that the phantom images for loudspeaker signals in, say, stereo should be
coincident with the 'missing' loudspeakers. Thus, in stereo, CL and CR sources are panned to positions of ±10
degrees from centre front. Other panned sources are derived on the same basis.
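
By way of illustration, downward mixing in the receiver reduces to a weighted sum per loudspeaker. The sketch below uses three of the Table A2.1 matrices for 3/2 source material; the names `DOWNMIX`, `downmix` and `gain_db` are hypothetical, and the last lines simply confirm that coefficients of .500 and .707 correspond to the 6 dB and 3 dB surround attenuations quoted above.

```python
import math

# Rows give one loudspeaker feed each; columns are L, R, C, SL, SR (Table A2.1).
DOWNMIX = {
    "1/0": [[0.707, 0.707, 1.000, 0.500, 0.500]],
    "2/0": [[1.000, 0.000, 0.707, 0.707, 0.000],
            [0.000, 1.000, 0.707, 0.000, 0.707]],
    "3/1": [[1.000, 0.000, 0.000, 0.000, 0.000],
            [0.000, 1.000, 0.000, 0.000, 0.000],
            [0.000, 0.000, 1.000, 0.000, 0.000],
            [0.000, 0.000, 0.000, 0.707, 0.707]],
}


def downmix(fmt, sample):
    """Apply the downward mixing equations to one 3/2 sample (L, R, C, SL, SR)."""
    return [sum(c * s for c, s in zip(row, sample)) for row in DOWNMIX[fmt]]


def gain_db(coefficient):
    """Express a mixing coefficient as a level change in dB."""
    return 20.0 * math.log10(coefficient)


print(downmix("2/0", [0.0, 0.0, 0.0, 1.0, 0.0]))  # left surround only
print(round(gain_db(0.500), 1), round(gain_db(0.707), 1))
```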






Table A2.1: Downward mixing equations for 3/2 source material.

Mono - 1/0 format

              L       R       C      SL      SR
C   =      .707    .707   1.000    .500    .500

Stereo - 2/0 format

              L       R       C      SL      SR
L'  =     1.000    .000    .707    .707    .000
R'  =      .000   1.000    .707    .000    .707

Three channels - 3/0 format

              L       R       C      SL      SR
L'  =     1.000    .000    .000    .707    .000
R'  =      .000   1.000    .000    .000    .707
C   =      .000    .000   1.000    .000    .000

Three channels - 2/1 format

              L       R       C      SL      SR
L'  =     1.000    .000    .707    .000    .000
R'  =      .000   1.000    .707    .000    .000
S'  =      .000    .000    .000    .707    .707

Four channels - 3/1 format

              L       R       C      SL      SR
L'  =     1.000    .000    .000    .000    .000
R'  =      .000   1.000    .000    .000    .000
C   =      .000    .000   1.000    .000    .000
S'  =      .000    .000    .000    .707    .707

Four channels - 2/2 format

              L       R       C      SL      SR
L'  =     1.000    .000    .707    .000    .000
R'  =      .000   1.000    .707    .000    .000
SL' =      .000    .000    .000   1.000    .000
SR' =      .000    .000    .000    .000   1.000




Table A2.2: Downward mixing equations for 4/4 source material 



Mono — 1/0 format 
















L 


R 


CL 


CR 


SLl 


SRI 


SL2 


SR2 


C = .707 


.707 


.707 


.707 


.354 


.354 


.354 


.354 


Stereo — 2/0 format 
















L 


R 


CL 


CR 


SLl 


SRI 


SL2 


SR2 


L' = 1.000 


.000 


.866 


.500 


.500 


.000 


.433 


.250 


R' = .000 


].000 


.500 


.866 


.000 


.500 


.250 


.433 



Three channels: 3/0 format

            L      R     CL     CR    SL1    SR1    SL2    SR2
L'  =   1.000   .000   .500   .000   .500   .000   .250   .000
R'  =    .000  1.000   .000   .500   .000   .500   .000   .250
C   =    .000   .000   .866   .866   .000   .000   .433   .433



Three channels: 2/1 format

            L      R     CL     CR    SL1    SR1    SL2    SR2
L'  =   1.000   .000   .866   .500   .000   .000   .000   .000
R'  =    .000  1.000   .500   .866   .000   .000   .000   .000
S'  =    .000   .000   .000   .000   .707   .707   .707   .707



Four channels: 4/0 format

            L      R     CL     CR    SL1    SR1    SL2    SR2
L'  =   1.000   .000   .000   .000   .500   .000   .000   .000
R'  =    .000  1.000   .000   .000   .000   .500   .000   .000
CL' =    .000   .000  1.000   .000   .000   .000   .500   .000
CR' =    .000   .000   .000  1.000   .000   .000   .000   .500



Four channels: 2/2 format

            L      R     CL     CR    SL1    SR1    SL2    SR2
L'  =   1.000   .000   .866   .500   .000   .000   .000   .000
R'  =    .000  1.000   .500   .866   .000   .000   .000   .000
SL' =    .000   .000   .000   .000  1.000   .000   .866   .500
SR' =    .000   .000   .000   .000   .000  1.000   .500   .866


Four channels: 3/1 format

            L      R     CL     CR    SL1    SR1    SL2    SR2
L'  =   1.000   .000   .500   .000   .000   .000   .000   .000
R'  =    .000  1.000   .000   .500   .000   .000   .000   .000
C   =    .000   .000   .866   .866   .000   .000   .000   .000
S'  =    .000   .000   .000   .000   .707   .707   .707   .707






Table A2.2: Downward mixing equations for 4/4 source material (cont.).



Five channels: 3/2 format

            L      R     CL     CR    SL1    SR1    SL2    SR2
L'  =   1.000   .000   .500   .000   .000   .000   .000   .000
R'  =    .000  1.000   .000   .500   .000   .000   .000   .000
C   =    .000   .000   .866   .866   .000   .000   .000   .000
SL' =    .000   .000   .000   .000  1.000   .000   .866   .500
SR' =    .000   .000   .000   .000   .000  1.000   .500   .866



Six channels: 4/2 format

            L      R     CL     CR    SL1    SR1    SL2    SR2
L'  =   1.000   .000   .000   .000   .000   .000   .000   .000
R'  =    .000  1.000   .000   .000   .000   .000   .000   .000
CL' =    .000   .000  1.000   .000   .000   .000   .000   .000
CR' =    .000   .000   .000  1.000   .000   .000   .000   .000
SL' =    .000   .000   .000   .000  1.000   .000   .866   .500
SR' =    .000   .000   .000   .000   .000  1.000   .500   .866
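One property of the tabulated numbers can be checked directly: for each source channel, the sum of the squared coefficients across the output channels gives the relative power with which that source is reproduced. The sketch below does this for the stereo (2/0) rows of Table A2.2. It is offered only as an arithmetic observation on the published values, not as a statement of how the Report derived them; the variable and function names are hypothetical.

```python
import math

SOURCES = ["L", "R", "CL", "CR", "SL1", "SR1", "SL2", "SR2"]

# Stereo (2/0) rows of Table A2.2 (4/4 source material).
STEREO_ROWS = {
    "L'": [1.000, 0.000, 0.866, 0.500, 0.500, 0.000, 0.433, 0.250],
    "R'": [0.000, 1.000, 0.500, 0.866, 0.000, 0.500, 0.250, 0.433],
}


def source_power(rows, index):
    """Sum of squared gains applied to one source across all outputs."""
    return sum(row[index] ** 2 for row in rows.values())


powers = {name: source_power(STEREO_ROWS, i) for i, name in enumerate(SOURCES)}

for name, power in powers.items():
    print(f"{name:>3}: power {power:.3f} ({10.0 * math.log10(power):+.1f} dB)")
```

With these values each front source is carried at close to unity power, and each surround source at about one quarter of that (roughly 6 dB lower).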






Printed by BBC RESEARCH DEPARTMENT, Kingswood Warren, Tadworth, Surrey, KT20 6NP



